Abstract


Clustering methods have been used extensively in computer vision and pattern recognition. 
Fuzzy clustering has been shown to be advantageous over crisp (or traditional) clustering in that 
total commitment of a vector to a given class is not required at each iteration. Recently fuzzy 
clustering methods have shown spectacular ability to detect not only hypervolume clusters, but also 
clusters which are actually "thin shells", i.e., curves and surfaces. Most analytic fuzzy clustering 
approaches are derived from Bezdek's Fuzzy C-Means (FCM) algorithm. The FCM uses the 
probabilistic constraint that the memberships of a data point across classes sum to one. This 
constraint was used to generate the membership update equations for an iterative algorithm. 
Unfortunately, the memberships resulting from FCM and its derivatives do not correspond to the 
intuitive concept of degree of belonging, and moreover, the algorithms have considerable trouble in 
noisy environments. Recently, we cast the clustering problem into the framework of possibility 
theory. Our approach was radically different from the existing clustering methods in that the 
resulting partition of the data can be interpreted as a possibilistic partition, and the membership 
values may be interpreted as degrees of possibility of the points belonging to the classes. We 
constructed an appropriate objective function whose minimum will characterize a good possibilistic 
partition of the data, and we derived the membership and prototype update equations from 
necessary conditions for minimization of our criterion function. In this paper, we show the ability 
of this approach to detect linear and quadric curves in the presence of considerable noise.
I. Introduction


Clustering has long been a popular approach to unsupervised pattern recognition. It has 
become more attractive with the connection to neural networks, and with the increased attention to 
fuzzy clustering. In fact, recent advances in fuzzy clustering have shown spectacular ability to
detect not only hypervolume clusters, but also clusters which are actually "thin shells", i.e., curves
and surfaces [1-7]. One of the major factors that influences the determination of appropriate groups
of points is the distance measure chosen for the problem at hand. Fuzzy clustering has been
shown to be advantageous over crisp (or traditional) clustering in that total commitment of a vector 
to a given class is not required at each iteration. 

Boundary detection and surface approximation are important components of intermediate- 
level vision. They are the first step in solving problems such as object recognition and orientation 
estimation. Recently, it has been shown that these problems can be viewed as clustering problems 
with appropriate distance measures and prototypes [1-7]. Dave's Fuzzy C Shells (FCS) algorithm 
[2] and the Fuzzy Adaptive C-Shells (FACS) algorithm [7] have proven to be successful in 
detecting clusters that can be described by circular arcs, or more generally by elliptical shapes.
Unfortunately, these algorithms are computationally rather intensive since they involve the solution
of coupled nonlinear equations for the shell (prototype) parameters. These algorithms also assume
that the number of clusters is known. To overcome these drawbacks we recently proposed a
computationally simpler Fuzzy C Spherical Shells (FCSS) algorithm [6] for clustering
hyperspherical shells and suggested an efficient algorithm to determine the number of clusters
when this is not known. We also proposed the Fuzzy C Quadric Shells (FCQS) algorithm [5]
which can detect more general quadric shapes. One problem with the FCQS algorithm is that it
uses the algebraic distance, which is highly nonlinear. This results in unsatisfactory performance
when the data is not very clean [7]. Finally, none of the algorithms can handle situations in
which the clusters include lines/planes and there is much noise. In [8], we addressed those issues
in a new approach called Plano-Quadric Clustering. In this paper, we show how that algorithm,
coupled with our new possibilistic clustering, can accurately find linear and quadric curves in the 
presence of noise. 

Most analytic fuzzy clustering approaches are derived from Bezdek's Fuzzy C-Means
(FCM) algorithm [9]. The FCM uses the probabilistic constraint that the memberships of a data
point across classes must sum to one. This constraint came from generalizing a crisp C-Partition of 
a data set, and was used to generate the membership update equations for an iterative algorithm. 
These equations emerge as necessary conditions for a global minimum of a least-squares type of 
criterion function. Unfortunately, the resulting memberships do not represent one's intuitive notion 
of degrees of belonging, i. e., they do not represent degrees of "typicality" or "possibility". 

There is another important motivation for using possibilistic memberships. Like all 
unsupervised techniques, clustering (crisp or fuzzy) suffers from the presence of noise in the data. 
Since most distance functions are geometric in nature, noise points, which are often quite distant 
from the primary clusters, can drastically influence the estimates of the class prototypes, and 
hence, the final clustering. Fuzzy methods ameliorate this problem when the number of classes is 
greater than one, since the noise points tend to have somewhat smaller membership values in all the 
classes. However, this difficulty still remains in the fuzzy case, since the memberships of 
unrepresentative (or noise) points can still be significantly high. In fact, if there is only one real 
cluster present in the data, there is essentially no difference between the crisp and fuzzy methods. 
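
The effect of the sum-to-one constraint on a noise point can be illustrated with a short sketch. It evaluates the standard FCM membership update, $\mu_{ij} = 1 / \sum_k (d_{ij}^2 / d_{kj}^2)^{1/(m-1)}$, for a single outlier. The function name `fcm_memberships`, the two cluster centers, and the outlier position are illustrative assumptions and are not taken from the experiments reported here.

```python
import numpy as np

def fcm_memberships(sq_dists, m=2.0):
    """Standard FCM membership update for one point.

    sq_dists: squared distances from the point to each of the C prototypes
              (assumed non-zero in this sketch).
    Returns memberships that sum to one (the probabilistic constraint).
    """
    sq_dists = np.asarray(sq_dists, dtype=float)
    # mu_i = 1 / sum_k (d_i^2 / d_k^2)^(1/(m-1))
    ratios = (sq_dists[:, None] / sq_dists[None, :]) ** (1.0 / (m - 1.0))
    return 1.0 / ratios.sum(axis=1)

# Hypothetical setup: two cluster centers and one distant noise point
centers = np.array([[0.0, 0.0], [10.0, 0.0]])
outlier = np.array([5.0, 100.0])          # far from both, but equidistant
sq_d = ((centers - outlier) ** 2).sum(axis=1)
mu_outlier = fcm_memberships(sq_d)
print(mu_outlier)  # each membership is 0.5 despite the large distance
```

Because only distance ratios enter the update, moving the outlier ten times farther away along the same line leaves its memberships unchanged.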

On the other hand, if a set of feature vectors is thought of as the domain of discourse for a
collection of independent fuzzy subsets, then there should be no constraint on the sum of the
memberships. The only real constraint is that the assignments do really represent fuzzy
membership values, i.e., they must lie in the interval [0,1]. In [10], we cast the clustering problem
into the framework of possibility theory. We briefly review this approach, and show its
superiority in recognizing shapes from noisy and incomplete data.

II. Possibilistic Clustering Algorithms 

The original FCM formulation minimizes the objective function given by 

$$J(L,U) = \sum_{i=1}^{C} \sum_{j=1}^{N} (\mu_{ij})^{m} d_{ij}^{2}, \quad \text{subject to} \quad \sum_{i=1}^{C} \mu_{ij} = 1 \ \text{for all } j. \tag{1}$$

In (1), $L = (\lambda_1, \ldots, \lambda_C)$ is a C-tuple of prototypes, $d_{ij}^2$ is the distance of feature point $x_j$ to cluster
$\lambda_i$, N is the total number of feature vectors, C is the number of classes, and $U = [\mu_{ij}]$ is a C x N
matrix called the fuzzy C-partition matrix [9] satisfying the following conditions:

$$\mu_{ij} \in [0,1] \ \text{for all } i \text{ and } j, \quad \sum_{i=1}^{C} \mu_{ij} = 1 \ \text{for all } j, \quad \text{and} \quad 0 < \sum_{j=1}^{N} \mu_{ij} < N \ \text{for all } i.$$

Here, $\mu_{ij}$ is the grade of membership of the feature point $x_j$ in cluster $\lambda_i$, and $m \in [1,\infty)$ is a
weighting exponent called the fuzzifier. In what follows, $\lambda_i$ will also be used to denote the ith
cluster, since it contains all of the parameters that define the prototype of the cluster.

Simply relaxing the constraint in (1) produces the trivial solution, i. e., the criterion 
function is minimized by assigning all memberships to zero. Clearly, one would like the 
memberships for representative feature points to be as high as possible, while unrepresentative 
points should have low membership in all clusters. This is an approach consistent with possibility 
theory [11]. The objective function which satisfies our requirements may be formulated as: 

$$J_m(L,U) = \sum_{i=1}^{C} \sum_{j=1}^{N} (\mu_{ij})^{m} d_{ij}^{2} + \sum_{i=1}^{C} \eta_i \sum_{j=1}^{N} (1 - \mu_{ij})^{m}, \tag{2}$$

where $\eta_i$ are suitable positive numbers. The first term demands that the distances from the feature
vectors to the prototypes be as low as possible, whereas the second term forces the $\mu_{ij}$ to be as
large as possible, thus avoiding the trivial solution. The following theorem, proved in [9], gives
necessary conditions for minimization, hence providing the basis for an iterative algorithm.

Theorem: 

Suppose that $X = \{x_1, x_2, \ldots, x_N\}$ is a set of feature vectors, $L = (\lambda_1, \ldots, \lambda_C)$ is a
C-tuple of prototypes, $d_{ij}^2$ is the distance of feature point $x_j$ to the cluster prototype $\lambda_i$ (i = 1,
..., C; j = 1, ..., N), and $U = [\mu_{ij}]$ is a C x N matrix of possibilistic membership values. Then U
may be a global minimum for $J_m(L,U)$ only if

$$\mu_{ij} = \frac{1}{1 + \left( d_{ij}^{2} / \eta_i \right)^{1/(m-1)}}.$$

The necessary conditions on the prototypes are identical to the corresponding conditions in the FCM and its
derivatives.
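
The membership condition in the theorem can be checked by a short calculation. The outline below assumes that the possibilistic objective has the two-term form described above, $J_m = \sum_i \sum_j \mu_{ij}^m d_{ij}^2 + \sum_i \eta_i \sum_j (1 - \mu_{ij})^m$, with $m > 1$ and the prototypes held fixed. It is a sketch of the stationarity condition only, not the full proof.

```latex
\begin{align*}
\frac{\partial J_m}{\partial \mu_{ij}}
  &= m\,\mu_{ij}^{\,m-1} d_{ij}^{2} - m\,\eta_i\,(1-\mu_{ij})^{m-1} = 0 \\
\left(\frac{\mu_{ij}}{1-\mu_{ij}}\right)^{m-1}
  &= \frac{\eta_i}{d_{ij}^{2}} \\
\frac{1-\mu_{ij}}{\mu_{ij}}
  &= \left(\frac{d_{ij}^{2}}{\eta_i}\right)^{1/(m-1)} \\
\mu_{ij}
  &= \frac{1}{1 + \left(d_{ij}^{2}/\eta_i\right)^{1/(m-1)}}
\end{align*}
```

Each $\mu_{ij}$ appears in exactly one pair of terms of the objective, so the minimization decouples across points and clusters.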

Thus, in each iteration, the updated value of $\mu_{ij}$ depends only on the distance of $x_j$ from
$\lambda_i$, which is an intuitively pleasing result. The membership of a point in a cluster should be
determined solely by how far it is from the prototype of the class, and should not be coupled to its 
location with respect to other classes. The updating of the prototypes depends on the distance 
measure chosen, and will proceed exactly the same way as in the case of the FCM algorithm and its 
derivatives. 
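
The alternation between membership and prototype updates can be sketched for the simplest case of a single point prototype with squared Euclidean distance. The fixed value of eta, the initialization at the sample mean, the function names, and the toy data are all assumptions made for illustration; the prototype update is the familiar weighted mean used in FCM.

```python
import numpy as np

def possibilistic_memberships(sq_dists, eta, m=2.0):
    """mu = 1 / (1 + (d^2 / eta)^(1/(m-1))), applied elementwise."""
    return 1.0 / (1.0 + (sq_dists / eta) ** (1.0 / (m - 1.0)))

def possibilistic_center(points, eta, m=2.0, n_iter=50):
    """Alternate membership and prototype updates for ONE point prototype.

    Assumptions for this sketch: squared Euclidean distance, a fixed eta,
    and initialization at the plain sample mean.
    """
    center = points.mean(axis=0)
    for _ in range(n_iter):
        sq_d = ((points - center) ** 2).sum(axis=1)
        mu = possibilistic_memberships(sq_d, eta, m)
        w = mu ** m
        # Same weighted-mean prototype update as in FCM
        center = (w[:, None] * points).sum(axis=0) / w.sum()
    return center, mu

# Hypothetical data: a symmetric 3x3 grid around the origin plus one outlier
grid = np.array([[x, y] for x in (-1.0, 0.0, 1.0) for y in (-1.0, 0.0, 1.0)])
points = np.vstack([grid, [[50.0, 50.0]]])

center, mu = possibilistic_center(points, eta=4.0)
print("sample mean:         ", points.mean(axis=0))  # pulled toward the outlier
print("possibilistic center:", center)               # stays near the origin
print("outlier membership:  ", mu[-1])
```

With a single cluster the sum-to-one constraint would force every membership to one, so the constrained estimate would coincide with the sample mean; here the outlier's membership falls off with its distance and its pull on the prototype becomes negligible.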

The value of $\eta_i$ determines the distance at which the membership value of a point in a
cluster becomes 0.5 (i.e., "the 3 dB point"). Thus, it needs to be chosen depending on the desired
"bandwidth" of the possibility (membership) distribution for each cluster. This value could be the
same for all clusters, if all clusters are expected to be similar. In general, it is desirable that $\eta_i$
relates to the overall size and shape of cluster $\lambda_i$. Also, it is to be noted that $\eta_i$ determines the
relative degree to which the second term in the objective function is important compared to the first.
If the two terms are to be weighted roughly equally, then $\eta_i$ should be of the order of $d_{ij}^2$. In
practice we find that the following definition works best.

$$\eta_i = \frac{\sum_{j=1}^{N} \mu_{ij}^{m} d_{ij}^{2}}{\sum_{j=1}^{N} \mu_{ij}^{m}} \tag{3}$$

This choice makes $\eta_i$ the average fuzzy intra-cluster distance of cluster $\lambda_i$. The value of $\eta_i$ can be
fixed for all iterations, or it may be varied in each iteration. When $\eta_i$ is varied in each iteration, care
must be exercised, since it may lead to instabilities. Our experience shows that the final clustering
is quite insensitive to large (an order of magnitude) variations in the values of $\eta_i$.
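
As a small numerical illustration, the sketch below reads "average fuzzy intra-cluster distance" as the $\mu_{ij}^m$-weighted mean of the squared distances to the prototype. The function name `estimate_eta` and the five membership/distance pairs are made up for the example; in practice the memberships would come from an initial partition.

```python
import numpy as np

def estimate_eta(mu_i, sq_dists_i, m=2.0):
    """Average fuzzy intra-cluster distance of one cluster.

    mu_i:        memberships of all N points in cluster i
    sq_dists_i:  squared distances of all N points to prototype i
    """
    w = np.asarray(mu_i, dtype=float) ** m
    return float((w * np.asarray(sq_dists_i, dtype=float)).sum() / w.sum())

# Hypothetical memberships and squared distances for five points
mu_i = np.array([1.0, 0.9, 0.8, 0.2, 0.1])
sq_d = np.array([0.0, 1.0, 2.0, 9.0, 16.0])

eta_i = estimate_eta(mu_i, sq_d)
# A point whose squared distance equals eta_i gets membership exactly 0.5
mu_at_eta = 1.0 / (1.0 + (eta_i / eta_i) ** (1.0 / (2.0 - 1.0)))
print(eta_i, mu_at_eta)
```

The two distant points contribute little to the estimate because their weights $\mu_{ij}^m$ are small, so the resulting value reflects the spread of the well-represented points.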

III. The Possibilistic C Plano-Quadric Shells Algorithm

Suppose that we are given a second-degree curve $\lambda_i$ characterized by a prototype vector

$$p_i^T = [p_{i1}, p_{i2}, \ldots, p_{ir}]$$

to which it is desired to fit points $x_j$ obtained through the application of some edge detection
algorithm. The vector $p_i$ contains the coefficients of the second-degree curve that describes cluster $\lambda_i$. If a
point $x$ has coordinates $[x_1, \ldots, x_n]$, then let

$$q = [x_1^2, x_2^2, \ldots, x_n^2, x_1 x_2, \ldots, x_{n-1} x_n, x_1, x_2, \ldots, x_n, 1]^T.$$

The equation of the second-degree curve that describes cluster i is given by $p_i^T q = 0$.
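
The mapping from a point to the vector q, and the test of whether the point satisfies the curve equation, can be sketched as follows. The helper name `quadric_features` and the example circle are assumptions chosen for illustration.

```python
import numpy as np
from itertools import combinations

def quadric_features(x):
    """Map a point x = [x_1, ..., x_n] to
    q = [x_1^2, ..., x_n^2, x_1 x_2, ..., x_{n-1} x_n, x_1, ..., x_n, 1]."""
    x = np.asarray(x, dtype=float)
    squares = x ** 2
    cross = np.array([x[a] * x[b] for a, b in combinations(range(len(x)), 2)])
    return np.concatenate([squares, cross, x, [1.0]])

# Hypothetical prototype: the circle x^2 + y^2 - 13 = 0
p = np.array([1.0, 1.0, 0.0, 0.0, 0.0, -13.0])

q_on = quadric_features([2.0, 3.0])    # lies on the circle
q_off = quadric_features([3.0, 3.0])   # lies outside it
print(q_on)          # [4. 9. 6. 2. 3. 1.]
print(p @ q_on)      # 0.0
print(p @ q_off)     # 5.0
```

Setting the three second-order coefficients of p to zero leaves only the linear terms, so the same representation also describes a line, e.g. p = [0, 0, 0, 1, -1, 0] for x - y = 0.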


When the exact (geometric) distance has no closed-form solution, one of the methods 
suggested in the literature is to use what is known as the "approximate distance" which is the first- 
order approximation of the exact distance. It is easy to show [12] that the approximate distance of a 
point from a curve is given by 

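
$$d_{Aij}^2 = d_A^2(x_j, \lambda_i) = \frac{(p_i^T q_j)^2}{|\nabla (p_i^T q_j)|^2} = \frac{p_i^T q_j q_j^T p_i}{p_i^T D_j D_j^T p_i}, \tag{4}$$

where $q_j$ denotes q evaluated at $x_j$ and $\nabla (p_i^T q_j)$ is the gradient of the distance functional

$$p_i^T q = [p_{i1}, p_{i2}, \ldots, p_{ir}] \, [x_1^2, \ldots, x_n^2, x_1 x_2, \ldots, x_{n-1} x_n, x_1, \ldots, x_n, 1]^T \tag{5}$$

evaluated at $x_j$. In (4) the matrix $D_j$ is simply the Jacobian of q evaluated at $x_j$.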
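
A small 2-D sketch may help to see how the approximate distance behaves. It assumes the usual first-order form, the squared algebraic distance $(p^T q)^2$ divided by the squared norm of the gradient $p^T D D^T p$, with D the 6 x 2 Jacobian of q. The function names and the unit-circle example are illustrative assumptions.

```python
import numpy as np

def q_and_jacobian_2d(x, y):
    """q = [x^2, y^2, xy, x, y, 1] and its 6x2 Jacobian D = dq/d(x, y)."""
    q = np.array([x * x, y * y, x * y, x, y, 1.0])
    D = np.array([
        [2.0 * x, 0.0],
        [0.0, 2.0 * y],
        [y, x],
        [1.0, 0.0],
        [0.0, 1.0],
        [0.0, 0.0],   # the constant term has zero derivative
    ])
    return q, D

def approximate_sq_distance(p, x, y):
    """First-order approximation: (p^T q)^2 / (p^T D D^T p)."""
    q, D = q_and_jacobian_2d(x, y)
    grad = D.T @ p                     # gradient of p^T q with respect to (x, y)
    return float((p @ q) ** 2 / (grad @ grad))

# Hypothetical example: unit circle x^2 + y^2 - 1 = 0, point at radius 1.1
p = np.array([1.0, 1.0, 0.0, 0.0, 0.0, -1.0])
algebraic = float((p @ q_and_jacobian_2d(1.1, 0.0)[0]) ** 2)
approx = approximate_sq_distance(p, 1.1, 0.0)
exact = (1.1 - 1.0) ** 2
print(algebraic, approx, exact)
```

For this point the approximate value lies much closer to the true squared geometric distance than the raw algebraic distance does. The sketch does not guard against points where the gradient vanishes (the center of the circle, for instance), where the ratio is undefined.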

One can easily reformulate the quadric shell clustering algorithm with $d_{Aij}^2$ as the
underlying distance measure. It was shown in [8] that the solution to the parameter estimation
problem is given by the generalized eigenvector problem

$$F_i p_i = l_i G_i p_i, \tag{6}$$

where

$$F_i = \sum_{j=1}^{N} (\mu_{ij})^{m} M_j, \quad M_j = q_j q_j^T, \quad \text{and} \quad G_i = \sum_{j=1}^{N} (\mu_{ij})^{m} D_j D_j^T,$$


which can be converted to the standard eigenvector problem if the matrix Gi is not rank-deficient. 
Unfortunately this is not the case. In fact, the last row of Dj is always [0, . . . ,0]. Equation (6) 
can still be solved using other techniques that use the modified Cholesky decomposition [13], and 
the solution is computationally quite inexpensive when the feature space is 2-D or 3-D. Another 
advantage of this constraint is that it can also fit lines and planes in addition to quadrics. Our 
experimental results show that the resulting algorithm, which we call the Possibilistic C Plano-Quadric
Shells (PCPQS) algorithm, is quite robust in the presence of poorly defined boundaries (i.e.,
when the edge points are somewhat scattered around the ideal boundary curve in the 2-D case
and when the range values are not very accurate in the 3-D case). It is also highly immune to impulse
noise and outliers. Of course, if the type of curve required is restricted to a single type, e.g.,
lines, or circles, or ellipses, simpler algorithms can be used with possibilistic updates, as will be
seen. 
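
One way to solve a problem of the form (6) despite the rank deficiency is sketched below for the 2-D case, with $F = \sum_j \mu_j^m q_j q_j^T$ and $G = \sum_j \mu_j^m D_j D_j^T$ (the membership weighting of G is assumed here). Since the last row and column of G are zero, the constant coefficient can be eliminated first, and the remaining 5 x 5 problem can be reduced to a standard symmetric one with an ordinary Cholesky factor. This reduction is a stand-in chosen for brevity and is not the modified Cholesky technique of [13]; the function names and the noise-free circle data are likewise assumptions.

```python
import numpy as np

def features_and_jacobian(x, y):
    """q = [x^2, y^2, xy, x, y, 1] and its 6x2 Jacobian."""
    q = np.array([x * x, y * y, x * y, x, y, 1.0])
    D = np.array([[2 * x, 0.0], [0.0, 2 * y], [y, x],
                  [1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])
    return q, D

def fit_plano_quadric(points, mu, m=2.0):
    """Solve F p = l G p for the smallest l (sketch, 2-D case)."""
    F = np.zeros((6, 6))
    G = np.zeros((6, 6))
    for (x, y), u in zip(points, mu):
        q, D = features_and_jacobian(x, y)
        F += (u ** m) * np.outer(q, q)
        G += (u ** m) * (D @ D.T)
    F11, f12, f22 = F[:5, :5], F[:5, 5], F[5, 5]
    # The last row of F p = l G p reads f12.p_a + f22*p_last = 0
    S = F11 - np.outer(f12, f12) / f22        # Schur complement of f22
    L = np.linalg.cholesky(G[:5, :5])         # assumes this block is positive definite
    Linv = np.linalg.inv(L)
    A = Linv @ S @ Linv.T                     # symmetric standard problem
    vals, vecs = np.linalg.eigh(A)            # eigenvalues in ascending order
    p_a = Linv.T @ vecs[:, 0]                 # eigenvector of the smallest l
    p_last = -(f12 @ p_a) / f22               # recover the constant term
    return np.append(p_a, p_last), vals[0]

# Hypothetical data: 20 noise-free points on the unit circle, all mu = 1
theta = np.linspace(0.0, 2.0 * np.pi, 20, endpoint=False)
pts = np.column_stack([np.cos(theta), np.sin(theta)])
p, l_min = fit_plano_quadric(pts, np.ones(len(pts)))
p = p / p[0]                                  # scale so the x^2 coefficient is 1
print(np.round(p, 6))                         # close to [1, 1, 0, 0, 0, -1]
```

In a possibilistic setting the vector `mu` would hold the current memberships of the points in the cluster being fitted, so that outliers with small membership contribute little to F and G.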
IV. Determination of Number of Clusters


The number of clusters C is not known a priori in some pattern recognition applications 
and most computer vision applications. When the number of clusters is unknown, one method to 
determine this number is to perform clustering for a range of C values, and pick the C value for 
which a suitable validity measure is minimized (or maximized) [14]. However, this method is
rather tedious, especially when the number of clusters is large. Also, in our experiments, we found
that the C value obtained this way may not be optimum. This is because when C is large, the
clustering algorithm sometimes converges to a local minimum of the objective function, and this
may result in a bad value for the validity of the clustering, even though the value of C is correct.
Moreover, when C is greater than the optimum number, the algorithm may split a single shell
cluster into more than one cluster, and yet achieve a good value for the overall validity. To
overcome these problems, we proposed in [8] an alternative Unsupervised C Shell Clustering
algorithm which is computationally more efficient, since it does not perform the clustering for an
entire range of C values. 
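
For comparison, the exhaustive search over C described at the start of this section can be sketched as a simple loop. The clustering routine and the validity measure are passed in as callables; the two toy stand-ins used below (equal-size chunks of sorted 1-D data, and within-chunk scatter plus a penalty per cluster) are hypothetical placeholders, not the shell clustering algorithm or the validity measure of [14].

```python
import numpy as np

def select_num_clusters(data, c_values, cluster_fn, validity_fn, minimize=True):
    """Run cluster_fn for every candidate C and keep the best validity."""
    best = None
    for c in c_values:
        partition = cluster_fn(data, c)
        score = validity_fn(data, partition)
        key = score if minimize else -score
        if best is None or key < best[0]:
            best = (key, c, partition)
    return best[1], best[2]

# Toy stand-in for a clustering algorithm: split sorted 1-D data into C chunks
def toy_cluster(data, c):
    return np.array_split(np.sort(data), c)

# Toy stand-in for a validity measure: scatter plus a penalty of 1 per cluster
def toy_validity(data, partition):
    scatter = sum(((chunk - chunk.mean()) ** 2).sum() for chunk in partition)
    return scatter + 1.0 * len(partition)

data = np.array([0.0, 0.1, 0.2, 10.0, 10.1, 10.2])
best_c, best_partition = select_num_clusters(data, [1, 2, 3], toy_cluster, toy_validity)
print(best_c)   # 2
```

The cost of this search grows with the number of candidate values, since every candidate requires a complete clustering run.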


Our proposed method progressively clusters the data starting with an overspecified number 
$C_{max}$ of clusters. Initially, the FCPQS algorithm is run with $C = C_{max}$. After the algorithm
converges, spurious clusters (with low validity) are eliminated; compatible clusters are merged; and 
points assigned to clusters with good validity are temporarily removed from the data set to reduce 
computations. The FCPQS algorithm is invoked again with the remaining feature points. The 
above procedure is repeated until no more elimination, merging, or removing occurs, or until 


V. Examples of Possibilistic Clustering for Shape Recognition 

Figures 1 and 2 show the detection of a circular "fractal edge" from a synthetically 
generated image. Figure 1(a) is the original composite fractal image; Figure 1(b) shows what a 
gray-scale edge operator finds (or doesn't find); Figure 1(c) is the output of the horizontal fractal
edge operator; with Figure 1(d) giving the maximum overall response of the fractal operators in 
four directions. Figure 2(a) depicts the (noisy) thresholded and thinned result from Figure 1(d). 
Figure 2(b) gives the final prototype found by the FPQCS (which, since there is only one cluster 
present, is the same as the crisp version). Note how the presence of noise distorts the final 
prototype. Figure 2(c) shows the possibilistic algorithm output, which is superimposed on the 
original image in Figure 2(d). The results of the PPQCS algorithm are virtually unaffected by 
noise. Several examples comparing crisp, fuzzy and possibilistic versions of clustering can be 
found in [6,8,10].


Figure 3 depicts the algorithm applied to the image of a model of the Space Shuttle. Figure 
3(a) is the original image. Figure 3(b) gives the output of a typical edge operator. Note that, due to 
the rather poor quality of the original image, the edges found are both noisy and incomplete. This data
was then input into the possibilistic plano-quadric clustering algorithm. Figure 3(c) gives the eight
complete prototypes which were found after running the algorithm. Finally, Figure 3(d) displays
the prototypes drawn only where sufficient edge points exist.

VI. Conclusions 


In this paper, we demonstrated how our new possibilistic approach to objective-function-based
clustering, coupled with our plano-quadric shells algorithm, can recognize first and second
degree shapes from incomplete and noisy edge data. This approach is superior to both crisp and
fuzzy clustering, as well as to traditional methods such as the Hough Transform. Extensions of
this approach to other classes of shapes are currently underway.
Figure 1. Detection of a fractal circular edge.

(a) Upper Left. Original fractal composite image. 

(b) Upper Right. Output of gray scale edge operator. 

(c) Lower Left. Output of "horizontal" fractal edge operator. 

(d) Lower Right. Maximum magnitude of the outputs of the fractal operators in four directions.
Figure 2. Recognition of circular boundary.

(a) Upper Left. Figure 1(d) thresholded and thinned. 

(b) Upper Right. Circular prototype found by fuzzy (or crisp) clustering. 

(c) Lower Left. Circular prototype found by possibilistic clustering. 

(d) Lower Right. Possibilistic prototype superimposed on original image. 
Figure 3. Recognition of Shuttle model boundaries.

(a) Upper Left. Original Shuttle image. 

(b) Upper Right. Incomplete and noisy edges found by edge operator. 

(c) Lower Left. Prototypes found by Possibilistic Plano-Quadric clustering.

(d) Lower Right. Possibilistic prototypes superimposed, drawn where there is sufficient edge
information.